Programmable and Scalable Architecture for Graphics Processing Units
Authors
Abstract
Graphics processing is an application area with a high level of parallelism at both the data level and the task level. Therefore, graphics processing units (GPUs) are often implemented as multiprocessing systems with high-performance floating-point processing and application-specific hardware stages for maximizing graphics throughput. In this paper we evaluate the suitability of Transport Triggered Architectures (TTA) as a basis for implementing GPUs. TTA improves scalability over traditional VLIW-style architectures, making it interesting for computationally intensive applications. We show that TTA provides high floating-point processing performance while allowing more programming freedom than vector processors. Finally, one of the main features of the presented TTA-based GPU design is its fully programmable architecture, making it a suitable target for general-purpose computing on GPU APIs, which have become popular.
Similar resources
Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)
Although sparse matrix-vector multiplication (SpMV) algorithms are simple, they form important parts of linear algebra algorithms in mathematics and physics. As these algorithms can be run in parallel, Graphics Processing Units (GPUs) have been considered one of the best candidates for running them. In recent years, power consumption has been considered as one of the metr...
A Scalable and Reconfigurable Shared-Memory Graphics Cluster Architecture
If the computational demands of an interactive graphics rendering application cannot be met by a single commodity Graphics Processing Unit (GPU), multiple graphics accelerators may be utilised on multi-GPU based systems such as SLI [1] or Crossfire [2] or by a cluster of PCs in conjunction with a software infrastructure. Typically these PC cluster solutions allow the application programmer to u...
Image and Video Processing on CUDA: State of the Art and Future Directions
In the last few years a myriad of computer graphic applications have been developed using standard programming techniques, which are mainly based on multicore general-purpose processors (CPUs) architectures. Due to the rapid turning towards high definition multimedia, more and more researches have been done that need both computational resources and memory space to achieve high performance. To ...
Efficient Image Processing Using Reaction-Diffusion CNN Implemented in CUDA Technology
This paper explores an implementation model for speeding-up the execution time for the highly computational model of the reaction-diffusion CNN (RD-CNN) described in [1]. RD-CNNs as well as standard CNNs are computing intensive, and this is a limiting factor to explore its full potential especially for image processing tasks. Hardware implementations using VLSI or FPGA architectures can provide...
Accelerating radio astronomy cross-correlation with graphics processing units
We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from “Large-N” arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on Nvidia’s Fermi architecture, sustaining u...
Journal title:
Volume, issue:
Pages: -
Publication date: 2009